Variant Discovery ◾ 119
GCF_009858895.2_ASM985889v3_genomic.fna.fai
GCF_009858895.2_ASM985889v3_genomic.fna.pac
GCF_009858895.2_ASM985889v3_genomic.fna.sa
4.2.1.1.4 Aligning short reads of FASTQ files to the reference genome
Since we have downloaded the FASTQ files and the sequence of the reference genome, it
is time to map the reads to the reference genome and to create a SAM file for each sample.
Remember that since the reads are paired end, there are two files (forward and reverse) for
each run. This time we will use the following syntax of “bwa mem” command:
bwa mem -M -t 4 \
-R “@RG\tID:….\tSM:…..” \
ReferenceSequence \
file1.fastq file2.fastq > output.sam 2> output.log
The “-M” option is to mark shorter split hits as secondary, “-t” option specifies the num-
ber of threads to speed up the mapping, and “-R” to add a header line. For variant calling,
we may need to add or create a read group (RG) header to hold the sample information.
As input, we will provide the FASTA reference sequence and the forward FASTQ file and
reverse FASTQ file in the case of paired-end reads. The output can be directed to a SAM
file.
The above form is suitable for a single run. However, we may have multiple samples and
mapping the reads of a single-sample files at a time may be tedious. Instead, we can use a
bash script with a loop to align all runs. The following script creates the subdirectory “sam”
and then aligns reads of each sample to the reference genome and outputs a SAM file and a
log file. When you run the script, make sure that the main project directory is the working
directory.
mkdir sam
cd fastq
for i in $(ls *.fastq | rev | cut -c 9- | rev | uniq);
do
bwa mem -M -t 4 \
-R “@RG\tID:${i}\tSM:${i}” \
../ref/GCF_009858895.2_ASM985889v3_genomic.fna \
${i}_1.fastq ${i}_2.fastq > \
../sam/${i}.sam 2> ../sam/${i}.log;
done
cd ..
To make sure that the read mapping process has been completed successfully, you can dis-
play the content of the “sam” directory with “ls sam”. You can also display the content of
any of the SAM files by using “less -S SamFileName”.